Smart Paper Notes

An approach to robust ICP initialization

Alexander Kolpakov , Michael Werman

Category: Computer Vision

2022-12-10

In this note, we propose an approach for initializing the Iterative Closest Point (ICP) algorithm that allows us to apply ICP to unlabelled point clouds that are related by rigid transformations. We also give bounds on the robustness of our approach to noise. Numerical experiments confirm our theoretical findings.
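
The abstract does not spell out the proposed initialization, so the following is only a minimal sketch of plain ICP in 2D, showing where an initial guess enters the loop. The function names and the `theta0`/`t0` parameters are illustrative assumptions, not the authors' method; a robust initializer such as the one proposed would supply those starting values.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src -> dst."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n
    dot = cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs, ys, xd, yd = xs - cx_s, ys - cy_s, xd - cx_d, yd - cy_d
        dot += xs * xd + ys * yd      # cosine component of the optimal angle
        cross += xs * yd - ys * xd    # sine component of the optimal angle
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation moves the rotated source centroid onto the target centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)

def apply_rigid_2d(points, theta, t):
    """Rotate each point by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

def icp_2d(src, dst, theta0=0.0, t0=(0.0, 0.0), iters=20):
    """Plain ICP: alternate nearest-neighbour matching and rigid re-fitting.

    (theta0, t0) is the initial guess; ICP only converges to the right
    answer when this guess is close enough to the true transformation.
    """
    theta, t = theta0, t0
    for _ in range(iters):
        moved = apply_rigid_2d(src, theta, t)
        # Unlabelled clouds: pair each moved point with its closest target point.
        matches = [
            min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in moved
        ]
        theta, t = best_rigid_2d(src, matches)
    return theta, t
```

With a small true rotation and translation the identity guess is usually sufficient; for large rotations the nearest-neighbour matches are wrong from the start, which is the failure mode a better initialization is meant to address.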

Translated by Google Translate

DecisioNet -- A Binary-Tree Structured Neural Network

Noam Gottlieb , Michael Werman

Categories: Computer Vision | Machine Learning

2022-07-03

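Deep neural networks (DNNs) and decision trees (DTs) are both state-of-the-art classifiers. DNNs perform well thanks to their representation learning capabilities, while DTs are computationally efficient because they perform inference along a single path (root to leaf) that depends on the input data. In this paper, we present DecisioNet (DN), a binary-tree structured neural network. We propose a systematic way to convert an existing DNN into a DN, creating a lightweight version of the original model. DecisioNet combines the strengths of both: it uses neural modules to perform representation learning and exploits its tree structure to execute only a portion of the computations. We evaluate various DN architectures, along with their corresponding baseline models, on the FashionMNIST, CIFAR10, and CIFAR100 datasets. We show that the DN variants achieve similar accuracy while significantly reducing the computational cost of the original network.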
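
The single-path inference idea can be sketched as follows. This is a toy illustration under stated assumptions, not the DecisioNet architecture: `Node`, `module`, `router`, and `infer` are hypothetical names, the "modules" are plain Python functions standing in for neural blocks, and routing is a hard left/right choice.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vector = List[float]

@dataclass
class Node:
    module: Callable[[Vector], Vector]                  # stands in for a neural block
    router: Optional[Callable[[Vector], bool]] = None   # True -> go right; None marks a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def infer(root: Node, x: Vector) -> Tuple[Vector, List[str]]:
    """Run only the modules on one root-to-leaf path; return output and path."""
    node, path = root, []
    while True:
        x = node.module(x)
        if node.router is None:          # reached a leaf: nothing else is executed
            return x, path
        go_right = node.router(x)
        path.append("R" if go_right else "L")
        node = node.right if go_right else node.left

# Toy tree: the root doubles its input, then routes on the sign of the sum.
tree = Node(
    module=lambda v: [2 * a for a in v],
    router=lambda v: sum(v) > 0,
    left=Node(module=lambda v: [-a for a in v]),
    right=Node(module=lambda v: [a + 1 for a in v]),
)
```

For any input, `infer(tree, x)` evaluates the root and exactly one of the two leaves, so the cost grows with tree depth rather than with the total number of modules.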

DeePaste -- Inpainting for Pasting

Levi Kassel , Michael Werman

Category: Computer Vision

2021-12-20

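One of the challenges of supervised learning is the need to procure a large amount of labeled data. A well-known way to address this problem is to use synthetic data in a copy-paste fashion: objects are cut out and pasted onto relevant backgrounds. Pasting objects naively produces artifacts that cause models to perform poorly on real data. We present a new method for cleanly pasting objects onto different backgrounds, so that the resulting dataset yields competitive performance on real data. The main emphasis is on treating the border of the pasted object using inpainting. We show state-of-the-art results on instance detection and foreground segmentation.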

General Deformations of Point Configurations Viewed By a Pinhole Model Camera

Yirmeyahu Kaminski , Michael Werman

Category: Computer Vision

2015-05-29

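This paper is a theoretical study of the following non-rigid structure-from-motion problem: what can be computed from a monocular view of a parametrically deforming set of points? We treat several variants of this problem for affine and polynomial deformations, with both calibrated and uncalibrated cameras. We show that, in general, at least three images with two quasi-identical deformations are needed in order to obtain a finite set of solutions for the point structure, and we compute some simple examples.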

Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

Category: Machine Learning

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.

Control and Dynamic Motion Planning for a Hybrid Air-Underwater Quadrotor: Minimizing Energy Use in a Flooded Cave Environment

Ilya Semenov , Robert Brown , Michael Otte

Category: Robotics

2023-01-03

We present a dynamic path planning algorithm to navigate an amphibious rotorcraft through a concave, time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotorcraft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow, along with the required control inputs. The rotorcraft has a 3-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via probabilistic roadmaps (PRM) and D* Lite.

Flexible Supervised Autonomy for Exploration in Subterranean Environments

Harel Biggie , Eugene R. Rush , Danny G. Riley , Shakeeb Ahmad , Michael T. Ohradzansky , Kyle Harlow , Michael J. Miles , Daniel Torres , Steve McGuire , Eric W. Frew

Category: Robotics

2023-01-02

While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and reconfigurable communications.

Muse: Text-To-Image Generation via Masked Generative Transformers

Huiwen Chang , Han Zhang , Jarred Barber , AJ Maschinot , Jose Lezama , Lu Jiang , Ming-Hsuan Yang , Kevin Murphy , William T. Freeman , Michael Rubinstein

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2023-01-02

We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
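
The masked modeling task described above can be illustrated with a small sketch of the input corruption step only. This is an assumption-laden toy, not Muse's training code: `MASK_ID`, `mask_tokens`, and the fixed `mask_ratio` are hypothetical, and the Transformer that would predict the hidden tokens from the text embedding is omitted.

```python
import random
from typing import List, Tuple

MASK_ID = -1  # hypothetical id reserved for the mask token

def mask_tokens(
    tokens: List[int], mask_ratio: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Replace a random subset of discrete image tokens with MASK_ID.

    Returns the corrupted sequence and the sorted positions that were masked;
    a model would be trained to predict the original tokens at those positions.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK_ID
    return masked, positions

# Usage: corrupt half of an 8-token sequence with a seeded generator.
image_tokens = [5, 9, 2, 7, 7, 1, 3, 8]
corrupted, targets = mask_tokens(image_tokens, 0.5, random.Random(0))
```

Because every masked position is predicted from the same corrupted sequence, many tokens can be filled in per step, which is the intuition behind the parallel decoding mentioned in the abstract.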

Urban Visual Intelligence: Studying Cities with AI and Street-level Imagery

Fan Zhang , Arianna Salazar Miranda , Fábio Duarte , Lawrence Vale , Gary Hack , Yu Liu , Michael Batty , Carlo Ratti

Category: Computer Vision

2023-01-02

The visual dimension of cities has been a fundamental subject in urban studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim, and Jacobs. Several decades later, big data and artificial intelligence (AI) are revolutionizing how people move, sense, and interact with cities. This paper reviews the literature on the appearance and function of cities to illustrate how visual information has been used to understand them. A conceptual framework, Urban Visual Intelligence, is introduced to systematically elaborate on how new image data sources and AI techniques are reshaping the way researchers perceive and measure cities, enabling the study of the physical environment and its interactions with socioeconomic environments at various scales. The paper argues that these new approaches enable researchers to revisit the classic urban theories and themes, and potentially help cities create environments that are more in line with human behaviors and aspirations in the digital age.

Logic Mill -- A Knowledge Navigation System

Sebastian Erhardt , Mainak Ghosh , Erik Buunk , Michael E. Rose , Dietmar Harhoff

Category: Natural Language Processing

2022-12-31

Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
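
Finding semantically similar documents from numerical representations can be sketched as a nearest-neighbour search over embedding vectors. The abstract does not name the similarity measure, so cosine similarity here is an assumption, and `cosine`, `most_similar`, and the toy 3-dimensional vectors are hypothetical; this does not use the Logic Mill API.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar(
    query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k document ids whose embeddings are closest to the query."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy corpus mixing a patent and two papers (embeddings are made up).
corpus = {
    "patent_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}
top = most_similar([1.0, 0.0, 0.0], corpus, k=2)
```

A linear scan like this is fine for a handful of documents; a corpus of the size the abstract mentions would need an approximate nearest-neighbour index instead.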
